Rich Parameterization Improves RNA Structure Prediction

Authors

  • Shay Zakov
  • Yoav Goldberg
  • Michael Elhadad
  • Michal Ziv-Ukelson
Abstract

Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations have so far remained fairly constant. We study how much prediction quality can be improved by increasing the amount of information used by RNA folding prediction models. To this end, we propose novel models that refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. Our proposed fine-grained models are made practical by the availability of large training sets, advances in machine learning, and recent accelerations of RNA folding algorithms. We show that applying more detailed models indeed improves prediction quality, while the running time of the folding algorithm remains fast. An additional important outcome of this experiment is a new RNA folding prediction model (coupled with a freely available implementation), which yields significantly higher prediction quality than previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Trained and tested over the same comprehensive data sets, our model achieves a score of 84% according to the F₁-measure over correctly predicted base-pairs (i.e., 16% error rate), compared to the previously best reported score of 70% (i.e., 30% error rate). That is, the new model yields an error reduction of about 50%. Trained models and source code are available at www.cs.bgu.ac.il/?negevcb/contextfold.
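The error-reduction figure in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the authors' implementation: the function names and the toy base-pair sets are assumptions, and F₁ is computed in the usual way as the harmonic mean of precision and recall over predicted base-pairs.

```python
def f1_score(predicted_pairs, true_pairs):
    """F1 over base-pairs: harmonic mean of precision and recall.

    Both arguments are iterables of (i, j) index pairs (hypothetical format).
    """
    predicted, true = set(predicted_pairs), set(true_pairs)
    true_positives = len(predicted & true)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(true)
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(old_score, new_score):
    """Fraction of the old error (1 - score) removed by the new score."""
    old_error = 1.0 - old_score
    new_error = 1.0 - new_score
    return (old_error - new_error) / old_error


# Toy structure (made-up data): two of three predicted pairs are correct.
toy_true = {(1, 10), (2, 9), (3, 8)}
toy_pred = {(1, 10), (2, 9), (4, 7)}
toy_f1 = f1_score(toy_pred, toy_true)

# Scores quoted in the abstract: 70% (previous best) and 84% (new model).
reduction = relative_error_reduction(0.70, 0.84)
print(f"toy F1 = {toy_f1:.3f}, error reduction = {reduction:.1%}")
```

With the quoted scores, the error drops from 30% to 16%, a relative reduction of 14/30, or roughly 47%, which the abstract rounds to "about 50%".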

Similar resources

Supporting Material for the Paper: Rich Parameterization Improves RNA Structure Prediction

We specify here the features which are used by our models. Each feature description is composed of two parts: a description of a structural element, and a (possibly empty) description of a sequential context. All models discussed in the paper are obtained by combining a set of structural elements St with a set of sequential contexts Co, and producing all corresponding features (i.e. producing a...

The four ingredients of single-sequence RNA secondary structure prediction. A unifying perspective

Any method for RNA secondary structure prediction is determined by four ingredients. The architecture is the choice of features implemented by the model (such as stacked basepairs, loop length distributions, etc.). The architecture determines the number of parameters in the model. The scoring scheme is the nature of those parameters (whether thermodynamic, probabilistic, or weights). The parame...

PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars

Background: RNA molecules play many important regulatory, catalytic and structural ...

Parameterization of middle atmospheric water vapor photochemistry for high-altitude NWP and data assimilation

This paper describes CHEM2D-H2O, a new parameterization of H2O photochemical production and loss based on the CHEM2D photochemical-transport model of the middle atmosphere. This parameterization accounts for the altitude, latitude, and seasonal variations in the photochemical sources and sinks of water vapor over the pressure region from 100–0.001 hPa (∼16–90 km altitude). A series of free-runn...

Evolutionary Algorithm for RNA Secondary Structure Prediction Based on Simulated SHAPE Data

BACKGROUND Non-coding RNAs perform a wide range of functions inside the living cells that are related to their structures. Several algorithms have been proposed to predict RNA secondary structure based on minimum free energy. Low prediction accuracy of these algorithms indicates that free energy alone is not sufficient to predict the functional secondary structure. Recently, the obtained inform...

Journal:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 18, Issue 11

Pages: -

Publication date: 2011